This page contains all data used in the Nature publication "Gene expression divergence recapitulates the developmental hourglass".
Contents |
1 Raw data |
2 Probe to gene mappings (FlyBase 5.14) |
3 Normalization, Averaging and Scaling |
4 Correlation Analysis |
5 Probe Analysis: GC content and overlaps |
Raw data
Species | Replicate | 01 0-2h | 02 2-4h | 03 4-6h | 04 6-8h | 05 8-10h | 06 10-12h | 07 12-14h | 08 14-16h | 09 16-18h | 10 18-20h | 11 20-22h | 12 22-24h | 13 24-26h |
D.mel | 1 | x | x | x | x | x | x | x | x | | | | | |
Dmel.zip | 2 | x | x | x | x | x | x | x | x | | | | | |
| 3 | x | x | x | x | x | x | x | x | | | | | |
| 4 | x | x | x | x | x | x | x | x | | | | | |
| 5 | x | x | x | x | x | x | x | x | x | x | | | |
| 6 | | | | | | x | x | x | x | x | | | |
| 7 | | | | | | x | x | x | x | | | | |
| 8 | | | | | | x | x | x | x | x | | | |
D.sim | 1 | x | x | x | x | x | x | x | x | x | | | | |
Dsim.zip | 2 | x | x | x | x | x | x | x | x | x | | | | |
| 3 | x | x | x | x | x | x | x | x | x | | | | |
D.ana | 1 | x | x | x | x | x | x | x | x | x | | | | |
Dana.zip | 2 | x | x | x | x | x | x | x | x | x | | | | |
| 3 | x | x | x | x | x | x | x | x | x | | | | |
D.pse | 1 | x | x | x | x | x | x | x | x | x | | | | |
Dpse.zip | 2 | x | x | x | x | x | x | x | x | x | | | | |
| 3 | x | x | x | x | x | x | x | x | x | | | | |
D.per | 1 | x | x | x | x | x | x | x | x | x | | | | |
Dper.zip | 2 | x | x | x | x | x | x | x | x | x | | | | |
| 3 | x | x | x | x | x | x | x | x | x | | | | |
D.vir | 1 | x | x | x | x | x | x | x | x | x | x | | | |
Dvir.zip | 2 | x | x | x | x | x | x | x | x | x | x | x | x | x |
| 3 | x | x | x | x | x | x | x | x | x | x | x | x | x |
Probe to gene mappings (FlyBase 5.14)
This file lists the genes that the probes were designed to target and the genes that those probes actually map to, based on FlyBase 5.14.
Normalization, Averaging and Scaling
Nr. | File description | File link | notes |
1 | all species, replicates, genes and probes raw | All species replicate probelog10.txt.zip | log10 of the gProcessedSignal from Agilent datafiles |
2 | all species, replicates, genes and probes quantile normalized per timepoint | All species replicate quantilenormalized probelog10.txt.zip | For each timepoint separately, the x (usually 3) replicates for one species were quantile normalized (bringing the comparable distributions of signals across timepoints to a common distribution). Quantile normalization was performed on the raw gProcessedSignal data, which were then logged. |
3 | all species, genes and probes averaged across replicates | All species avgprobe log10.txt.zip | Data from 2 were averaged across replicates ((rep1+rep2+rep3)/3). Again, data were logged only after averaging, for output purposes. |
4 | all species and genes averaged across probes & row-normalized | All species gene rownorm.txt.zip | Data from 3 were logged, averaged per-timepoint across probes using Tukey Biweight Average (removing outliers) and row-normalized (i.e. converted to deviations from the mean centered on 0) |
5 | all species and genes scaled | All species gene scaled relative to amel.txt.zip | Scaling factors relative to dmel, estimated from the data, were applied to the row-normalized gene data from 4. This is an input for correlation analysis. |
6 | all species, replicates, genes and probes quantile normalized and scaled | All species replicate probelog10 quantileNormalized scaled.txt.zip | Data from 2 were scaled according to precomputed scaling factors. That is, the quantile-normalized data for each probe, gene, replicate and species were scaled (for dmel the data do not change compared to row 2, as the scaling factor = 1). This is an input for ANOVA analysis. |
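The processing in rows 2 to 4 can be illustrated with a short plain-Python sketch. It is a minimal illustration under stated assumptions, not the code used for the publication: all function names are hypothetical, ties in the quantile normalization are broken by position, and the Tukey biweight is written as the common one-step estimator with tuning constant c = 5, which the table does not specify. The toy signal values are invented.

```python
import math
from statistics import mean, median


def quantile_normalize(replicates):
    """Bring equal-length replicate columns to a common distribution.

    Each value is replaced by the mean, across replicates, of the values
    sharing its rank. Ties are broken by position (a simplification).
    """
    n = len(replicates[0])
    sorted_cols = [sorted(col) for col in replicates]
    # Reference distribution: mean of the k-th smallest value of each replicate.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in replicates:
        order = sorted(range(n), key=lambda i: col[i])  # indices by increasing signal
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean (assumed variant): values far from
    the median, measured in units of c * MAD, receive weight 0."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def row_normalize(profile):
    """Convert a time-course profile to deviations from its mean (centered on 0)."""
    mu = mean(profile)
    return [v - mu for v in profile]


# Toy example: 3 replicates x 4 probes of raw signal at one timepoint.
raw = [[100.0, 400.0, 200.0, 800.0],
       [120.0, 380.0, 260.0, 700.0],
       [80.0, 420.0, 140.0, 900.0]]
qn = quantile_normalize(raw)                           # row 2: normalize raw signal
avg = [mean(rep[p] for rep in qn) for p in range(4)]   # row 3: average replicates
log_avg = [math.log10(v) for v in avg]                 # log only after averaging
gene_value = tukey_biweight(log_avg)                   # row 4: collapse probes of one gene
```

Applying `row_normalize` to the per-timepoint gene values of one species then gives a profile of the kind described in row 4. The scaling step of rows 5 and 6 is not sketched, because the table does not state how the scaling factors were estimated.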
Correlation Analysis
We calculated pairwise correlation coefficients (for all species combinations) from the scaled, row-normalized gene profiles (All pairwise correlations.zip).
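As an illustration of a pairwise comparison over all species combinations, the sketch below computes a correlation coefficient for every species pair from the profiles of a single gene. It rests on assumptions that the text does not confirm: the coefficient is taken to be Pearson's r, profiles are compared per gene across timepoints, and the function names and profile values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(profiles):
    """profiles maps species name -> expression profile of one gene.

    Returns {(species_a, species_b): r} for every unordered species pair.
    """
    return {(a, b): pearson(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}


# Invented row-normalized profiles of one gene in three species.
profiles = {
    "dmel": [-1.0, 0.0, 1.0, 0.0],
    "dsim": [-2.0, 0.0, 2.0, 0.0],
    "dvir": [1.0, 0.0, -1.0, 0.0],
}
corr = pairwise_correlations(profiles)
```

With the six species of this dataset, `combinations` yields 15 species pairs per gene.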
Probe Analysis: GC content and overlaps
Nr. | File description | File link | notes |
1 | Probes: entropy, quality, blast, GC%. | Allprobes GC content.zip | A list of all the probes together with their sequences, entropy, blast and quality scores, and their GC percentage. |
2 | Overlaps between probes within genes. | Overlaps basepairs bygene.zip | Overlaps in base-pairs for all pairwise comparisons between probes within genes (4 per gene since all species have the same overlap scores for their probes). |
3 | Relationship between GC content and expression intensity. | GC intens scatter hist.zip | A plot showing the relationship between GC content and mean expression intensity for all probes. |
4 | Relationship between GC content contrasts and p-values. | GC pval.zip | A plot showing the relationship between GC content contrasts between species and the associated bootstrapped p-values. |
5 | GC content for species-specific probes. | GC species box.zip | A plot showing the GC content for probes within each species. The plot shows that simulans has the highest GC content and virilis the lowest. |
6 | Relationship between GC variance within and GC variance between species. | GC within between.pdf.zip | This plot shows that there is a slight tendency for probes with low variance within species to show high variance between species and vice versa. |
7 | Distribution of probe overlaps for all pairwise probe comparisons. | Probe overlap all hist.zip | Histogram showing the distribution of the fraction of probe overlap for all 6 pairwise comparisons for the 4 probes per gene. |
8 | Distribution of probe overlaps for neighbouring probe comparisons. | Probe overlap hist.zip | Histogram showing the distribution of the fraction of probe overlap for neighbouring probes. |
9 | Relationship between probe overlap and variance in GC content between species. | Overlap varGCbetween.zip | This plot shows the relationship between probe overlap and variance in GC content between species: probes that overlap more tend to have more variance between species. |
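The two basic quantities behind these files, GC percentage of a probe and overlap in base pairs between two probes of the same gene, can be sketched as follows. This is a minimal sketch under assumptions: probe positions are represented as hypothetical half-open (start, end) intervals on the transcript, and the probe names, coordinates and sequence are invented for illustration.

```python
from itertools import combinations


def gc_percent(seq):
    """Percentage of G and C bases in a probe sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def overlap_bp(a, b):
    """Overlap in base pairs of two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Invented coordinates for the 4 probes of one gene.
probes = {"p1": (0, 60), "p2": (40, 100), "p3": (100, 160), "p4": (150, 210)}

# All 6 pairwise comparisons between the 4 probes.
overlaps = {(x, y): overlap_bp(probes[x], probes[y])
            for x, y in combinations(sorted(probes), 2)}

gc = gc_percent("ATGCGGCCAT")  # 6 of 10 bases are G or C
```

Here `overlaps` contains one entry per probe pair, matching the 6 pairwise comparisons for the 4 probes per gene mentioned in row 7; neighbouring-probe overlaps (row 8) would be the subset of pairs adjacent along the transcript.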